Abstract

Genetic diversity is a fundamental input for every plant breeding program, genetic resources conservation, and 
evolutionary studies. In situ diversity and population genetic structure of eight cultivated sorghum landrace 
populations were investigated in the center of origin, Ethiopia using seven phenotypic traits and 12 highly 
polymorphic sorghum SSR markers. In farmers' fields, DNA samples were collected on Whatman® FTA® plant saver cards
and quantitative phenotypic traits were measured on 160 individual plants belonging to the eight
populations, which represent three diverse geographical regions. High diversity was observed among the
populations for the measured phenotypic traits. The 12 SSR loci produced a total of 123 alleles, of which 78
(63.41%) were rare (frequency <0.05), with an average of 10.25 alleles per polymorphic locus. The polymorphism
information content (PIC) ranged from 0.39 to 0.85, showing the good discriminatory power of the SSR loci used.
Average observed heterozygosity and gene diversity across all populations and loci ranged from 0.04 to 0.33 and
from 0.41 to 0.87, respectively. Neighbor-joining and STRUCTURE analyses grouped the 160 samples from the eight
populations differently. AMOVA attributed 54.44% of the variation to differences within populations, 32.76% to
differences among populations within regions, and 12.8% to differences among the regions of origin. Divergence
across all populations was high (FST = 0.40), indicating a low level of gene flow (Nm = 0.38), although high gene
flow was observed between some adjacent populations. The populations from Wello were closely related to the
remote Gibe and Metekel populations, suggesting that the variation followed human migration patterns.
Implications of the results for sorghum improvement and
germplasm conservation are discussed. 
Background

In terms of cultivated area and total grain production, 
sorghum (Sorghum bicolor) is the fifth-most important 
cereal in the world. It serves as a staple for millions of 
people in Africa and Asia (Ejeta and Grenier 2005). 
Africa has become the leading sorghum producer in 
recent years with an average annual volume contribution 
of >25 million tons of grain and the area covered by the 
crop in this continent is larger than in other continents 
(FAOSTAT 2010). Ethiopia is the third largest producer 
of sorghum in Africa behind Nigeria and Sudan with a 
contribution of about 12% of annual production (Wani
et al. 2011) and the second after Sudan in the Common 
Market for Eastern and Southern Africa (COMESA) 
member countries (USAID 2010). Next to maize and tef, 
sorghum is the third-most important cereal in Ethiopia 
(CSA 2012). In Ethiopia, sorghum covers 16% of the 
total area allocated to grains (cereals, pulses, and oil 
crops) and 20% of the area covered by cereals (CSA Cen- 
tral Statistical Agency 2011). In 2012 alone, 5.2 million 
holders produced 3.9 million tons of sorghum grain on 
1.9 million hectares of land. More than 95% of this area 
was covered by landraces. Sorghum is the second most 
important crop for injera (common leavened flat bread) 
next to tef (Adugna 2012). The grain is used for the 
preparation of traditional foods, distilled and undistilled
beverages and the biomass is highly valued for construc- 
tion, fuel and animal feed. The crop grows almost exclu- 
sively during the main rainy season, which in most 
regions extends from March to November/December. 

Ethiopia is a global reservoir of favorable genes for various
crops, including sorghum [Sorghum bicolor (L.) Moench], for
which it is a Vavilovian center of origin and diversity.
Ethiopian farmers grow mixtures of diverse sorghum
landraces in their fields for various local purposes.
Ethiopian sorghum germplasm has contributed substantially
to global agriculture. Singh and Axtell (1973) identified two
high-lysine Ethiopian sorghum lines, IS11167 and IS11758.
IS 12662C (SC 171), the source of the A2 cytoplasm (the
sterile line) used in hybrid development, which belongs to
the Caudatum Nigricans group (Guinea race), was also
obtained from Ethiopia (Schertz 1977). Moreover, studies
identified two sorghum lines native to Ethiopia (B35 and
E36-1) as sources of "stay-green" drought tolerance, which
are currently used in marker-assisted breeding programs
(Rosenow et al. 1983; Reddy et al. 2009). Wu et al. (2006)
identified seven sorghum lines of Ethiopian origin as
resistant to greenbug biotype I: ETS2140 (PI452752),
ETS3447 (PI455203), ETS3805 (PI455812), ETS4159
(PI456490), ETS4167 (PI456504), ETS4565 (PI457212), and
ETS4614-B (PI457314). Another example is E 35-1, a
selection from the Ethiopian zera-zera sorghum landrace,
which has now been introduced for direct cultivation and
into breeding programmes in many countries (IBC Institute
of Biodiversity Conservation 2007). In addition, some
superior varieties of Ethiopian origin were released in India,
Eritrea, Burkina Faso, Zambia, Burundi and Tanzania (Reddy
et al. 2006), showing their contribution to the economies of
these countries. As the center of origin and diversity for
sorghum, Ethiopia may therefore harbor unique germplasm
that is valuable for crop improvement and worth conserving.

The significance of studying the genetic diversity of
plants is explained elsewhere (e.g., Mutegi et al. 2011).
Over the years, a number of studies have estimated
genetic diversity in cultivated sorghum using phenotypic
traits (e.g., Zongo et al. 1993; Appa et al. 1996; Ayana
and Bekele 1998; Ayana et al. 2000; Dahlberg et al. 2002;
Shehzad et al. 2009), allozymes (Aldrich et al. 1992;
Ayana et al. 2001), RAPDs (Menkir et al. 1997; Ayana
et al. 2000; Agrama and Tuinstra 2003; Nkongolo and
Nsapato 2003; Uptmoor et al. 2003), RFLPs (Cui et al.
1995; Yang et al. 1996; Jordan et al. 1998), ISSRs (Yang
et al. 1996), AFLPs (Uptmoor et al. 2003; Menz et al.
2004; Geleta et al. 2006; Ritter et al. 2007), genomic SSR
markers (Dean et al. 1999; Ghebru et al. 2002; Agrama
and Tuinstra 2003; Menz et al. 2004; Casa et al. 2005;
Geleta et al. 2006; Barnaud et al. 2007; Deu et al. 2008;
Wang et al. 2009; Mutegi et al. 2011; Cuevas and Prom
2013), EST-SSR markers (Ramu et al. 2013), Diversity
Arrays Technology (DArT™) (Mace et al. 2008) and SNP
markers (Wang et al. 2013; Morris et al. 2013). Some of
these studies were based on global and local accessions
from gene banks, while others were based on field
collections; most of them reported moderate to high
diversity. Each of these marker systems has its own
advantages and limitations. In addition, some studies
(e.g., Labeyrie et al. 2014) examined the influence of
ethnolinguistic and cultural diversity on the genetic
structure of sorghum populations.

Phenotypic traits may not give reliable estimates of
genetic diversity because they are limited in number
and influenced by the environment (van Beuningen and
Busch 1997). In contrast, molecular diversity data
can potentially bridge conservation and use when
employed as a tool for mining germplasm collections for
genomic regions associated with adaptive or agronomi-
cally important traits (Casa et al. 2005). Simple sequence
repeat (SSR) markers are among the markers of choice
for population genetic studies because they are highly
polymorphic even between closely related individuals
within a species (Edwards et al. 1996) and transferable
between populations (Taramino and Tingey 1996; Gupta
et al. 1999); they also require only small amounts of
DNA, are highly reproducible, codominant and abundant,
and are fairly evenly distributed throughout the
euchromatic regions of the genome (e.g., Schlotterer
2004). Although indispensable, information on the in
situ diversity and genetic structure of cultivated sorghum
in its center of origin, Ethiopia, based on reliable marker
systems such as SSRs is lacking. This study was designed
to fill this gap. Its aims were 1) to investigate the genetic
diversity of sorghum landraces sampled in situ from three
agroclimatic regions of Ethiopia using phenotypic traits
and SSR markers; 2) to investigate the factors shaping the
population genetic structure of sorghum landraces; and
3) to suggest measures to aid crop improvement and
genetic resources conservation.

Materials and methods 

Study areas and plant samples 

The geographical characteristics of the sample collection
sites are presented in Table 1. One hundred sixty plant
samples of cultivated sorghum were collected from eight
populations in three diverse geographical and agro-
climatic regions of Ethiopia in October and November
2009 to study in situ genetic diversity and population
structure. Four of the eight populations were collected
in Wello in Amhara regional state (one from the South and
three from the North Wello administrative zones), two in the
Gibe river valley (Oromia regional state), and two in Metekel zone
(in Benishangul-Gumuz regional state).

Table 1 Geographical characteristics of the sorghum collection sites and names of the dominant cultivars

Geographical zone | Specific location | Site code* | Latitude (N) | Longitude (E) | Altitude (m) | Dominant cultivar
Gibe | Gibe river bridge side | Gibe-1 (G1) | 8° 13' | 37° 34' | 1115 | Dalecho
Gibe | Gibe-ILRI | Gibe-2 (G2) | 8° 14' | 37° 34' | 1149 | Key Mashila
Metekel | Pawe settlement village-6 | Metekel-1 (M1) | 11° 18' | 36° 24' | 1088 | Bobe
Metekel | Mandura | Metekel-2 (M2) | 11° 05' | 36° 25' | 1404 | Bobe/Mera mixed
South Wello | Jara Kechema | Wello-1 (W1) | 10° 30' | 39° 56' | 1433 | Mera
North Wello | Abuare | Wello-2 (W2) | 12° 05' | 39° 39' | 1426 | 76T1#23 (improved)
North Wello | Kobo town side | Wello-3 (W3) | 12° 08' | 39° 37' | 1500 | Jigurte
North Wello | Alamata-Gerjele | Wello-4 (W4) | 12° 26' | 39° 36' | 1486 | Degalit

*Letters and figures in parentheses are codes of the collection sites on the Ethiopian map (Figure 1).

The different
landrace collections are known by different local names
(Table 1). These regions were selected for four reasons:
1) to cover a broad swath of the range of sorghum
cultivation in Ethiopia; 2) sorghum is the dominant crop
in these regions; 3) Wello has suffered recurrent drought,
and improved varieties have been introduced into the
region as food and seed aid, putting the landraces at risk
of displacement; and 4) Metekel and Gibe are high-rainfall
(>1000 mm), fertile settlement areas, mainly for people
from Wello, where local sorghum landraces might be
displaced by landraces from other regions (e.g., Wello) or
by other crop species. In contrast, the Wello collection
areas are characterized by low annual rainfall (600-700 mm).
All of the collection regions have high temperatures. In
all of them, long-cycle sorghum landraces are
traditionally sown in March/April for the main rainy
season and harvested in November/December. Each
collection site was considered to represent a population,
from which 20 plants were sampled. Each population
was a mixture of different landraces collected from three
to five farmers' fields. The names of the dominant land-
races, based on farmers' assignments at each site, are
presented in Table 1. The coordinates and altitudes of
the collection sites were recorded with a Garmin GPSmap
60CSx Global Positioning System (GPS) receiver and later
overlaid onto the regional map of Ethiopia using ArcGIS
version 9.3 (Figure 1).

DNA extraction, PCR amplification and genotyping

Leaf squashes were collected in situ from mature plants
using Whatman® FTA® plant saver cards. DNA was
extracted and purified using a two-step protocol
developed by the manufacturer and optimized for
sorghum by Adugna et al. (2011). DNA extraction and
the subsequent molecular marker analysis were carried
out at the Stanley J. Aronoff Laboratory, Ohio State
University, Columbus, Ohio. PCR was run using 12
sorghum microsatellite loci that were previously mapped
(Brown et al. 1996; Taramino et al. 1997; Bhattramakki
et al. 2000; Li et al. 2009) and represent all 10 sorghum
linkage groups (Additional file 1: Table S1). These loci
were selected for their high polymorphism and
compatibility for multiplexing. PCR followed the
QIAGEN® multiplex master mix kit protocol for SSR
multiplexing, and forward primers were labeled with
different fluorescent dyes: FAM (6-carboxyfluorescein),
HEX (hexachloro-6-carboxyfluorescein), or NED
(2',7',8'-benzo-5'-fluoro-2',4,7-trichloro-5-carboxyfluorescein)
(PE-Applied Biosystems, Foster City, CA). PCR was
carried out in a 12 µl total reaction volume containing
1 pM of each primer pair in a multiplex, 1 µl of template
DNA, 2.6 µl of sterile ddH2O, and 6 µl of QIAGEN®
Multiplex PCR 2X Master Mix. Reactions were run in a
Mastercycler (Eppendorf®) with an initial denaturation
step of 15 min at 95°C, followed by 35 cycles of 30 sec at
94°C, 90 sec at 58°C and 60 sec at 72°C, a final extension
of 30 min at 60°C, and a hold at 4°C, following the
QIAGEN® protocol for microsatellite multiplexes.

To determine SSR fragment sizes, 2 µl of the PCR
product was diluted with 14 µl of ddH2O, and 2 µl of the
diluted product was then added to 14 µl of a 36:1
mixture of Hi-Di™ Formamide and GeneScan™ 350 ROX™
size standard in a 96-well microtiter plate, denatured at
95°C for 5 minutes, and cooled on ice for at least 5
minutes. Fragments were separated on an ABI 3100
Genetic Analyzer (DNA sequencer), and sizes were read
using the associated GeneMapper 3.7 software (Applied
Biosystems Inc., CA, USA) and scored manually. To
reduce the effects of imprecise fragment sizing (e.g., due
to stutter, large-allele dropout, or null alleles) on
genotyping, the software Allelobin (Prasanth et al. 2006)
was used to classify observed SSR allele sizes into
representative discrete allele sizes using a variation of
the least-squares minimization algorithm of Idury and
Cardon (1997).
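Allelobin itself is not reproduced here. As a rough illustration of the idea behind least-squares allele binning, the Python sketch below assumes a known repeat length and fits a single lattice offset a, so that binned sizes are a + k × repeat length, by grid search over a. The function names, the 0.01 bp grid step and the example sizes are hypothetical; Allelobin's actual algorithm (a variation of Idury and Cardon 1997) is more elaborate.

```python
import numpy as np

def fit_lattice_offset(sizes, repeat_len, step=0.01):
    """Grid-search the offset a in [0, repeat_len) that minimises the sum of
    squared deviations between observed sizes and the lattice a + k*repeat_len."""
    sizes = np.asarray(sizes, dtype=float)
    best_a, best_sse = 0.0, np.inf
    for a in np.arange(0.0, repeat_len, step):
        k = np.round((sizes - a) / repeat_len)       # nearest repeat count per fragment
        sse = float(np.sum((sizes - (a + k * repeat_len)) ** 2))
        if sse < best_sse:
            best_a, best_sse = float(a), sse
    return best_a, best_sse

def bin_alleles(sizes, repeat_len, step=0.01):
    """Snap each observed fragment size to its nearest point on the fitted lattice."""
    a, _ = fit_lattice_offset(sizes, repeat_len, step)
    sizes = np.asarray(sizes, dtype=float)
    k = np.round((sizes - a) / repeat_len)
    return a + k * repeat_len

# Hypothetical sizes (bp) at a dinucleotide locus; not data from this study.
observed = [150.3, 152.1, 151.9, 154.4, 156.2, 149.8]
print(np.round(bin_alleles(observed, repeat_len=2), 2))
```

Fragments that differ by less than half a repeat (here 151.9 and 152.1 bp) end up in the same bin, which is the behavior binning is meant to achieve.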
Data recording and statistical analysis
Quantitative phenotypic measurements

To estimate phenotypic diversity, seven common
quantitative phenotypic traits were measured in the field
on 160 cultivated S. bicolor plants (20 plants per site,
each site representing a population), following the
descriptors for sorghum (IBPGR/ICRISAT 1993). The
traits were head length (HDL) and width (HDW) (cm),
flag leaf length (LL) and width (LW) (cm), total number
of leaves (LN) on the main stalk, plant height (PLHT)
(cm), and number of tillers (TIL). The quantitative
phenotypic data were scaled to fit a normal distribution
and subjected to simple descriptive statistics. Pearson's
correlation coefficients were computed between all pairs
of traits, and their significance was tested using a t-test.
The correlation matrix was then used to perform principal
component analysis (PCA) using GenStat software.
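The correlation test and PCA described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the GenStat procedure: it assumes the traits are standardized to zero mean and unit variance (one reading of "scaled"), the function names are hypothetical, and the input is randomly generated rather than the study's measurements.

```python
import numpy as np
from scipy import stats

def pearson_t_test(x, y):
    """Pearson r between two traits, its t statistic (n - 2 d.f.) and two-sided p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)
    return r, t, p

def pca_on_correlation(data):
    """PCA of a plants x traits matrix based on the trait correlation matrix.

    Returns eigenvalues (descending), eigenvectors (one column per PC),
    the proportion of variance explained, and the PC scores of each plant."""
    X = np.asarray(data, dtype=float)
    R = np.corrcoef(X, rowvar=False)                  # traits x traits
    eigvals, eigvecs = np.linalg.eigh(R)              # R is symmetric
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardised traits
    scores = Z @ eigvecs
    return eigvals, eigvecs, eigvals / eigvals.sum(), scores

# Hypothetical data: 160 plants x 7 traits drawn at random (not the study's data).
rng = np.random.default_rng(1)
traits = rng.normal(size=(160, 7))
r, t, p = pearson_t_test(traits[:, 0], traits[:, 3])
vals, vecs, explained, scores = pca_on_correlation(traits)
print(f"r = {r:.3f}, t = {t:.2f}, p = {p:.3f}; first 3 PCs explain {explained[:3].sum():.1%}")
```

Because eigenvector signs are arbitrary, loadings from any implementation may differ from those in Table 2 by a sign; only their magnitudes and relative signs within a component are interpretable.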

SSR polymorphism and analysis of genetic diversity

To estimate the discriminatory power of the SSR loci,
the polymorphism information content (PIC) (Botstein
et al. 1980; Anderson et al. 1993) was computed using
PowerMarker software V3.25 (Liu and Muse 2005). The
number and frequency of SSR alleles were also computed
using the same software. GENEPOP 4.0 (Rousset 2008)
was used to compute exact tests for Hardy-Weinberg
equilibrium and for genotypic disequilibrium between
pairs of loci; these tests were complemented with the
HW-QuickCheck program (Kalinowski 2006). Nei's
heterozygosity estimates (HO, HS, and HT) were
computed using FSTAT software (Goudet 2002). Allelic
richness (Rs) and private allelic richness (Rp) were
computed using the rarefaction method (Hulbert 1971)
implemented in HP-Rare 1.1 software (Kalinowski 2005).
The significance of differences in overall gene diversity,
allelic richness and private allelic richness between
populations and among the regions of collection was
tested using a nonparametric Wilcoxon signed-rank test
(Wilcoxon 1945) implemented in SPSS Statistics
release 17.
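To show what the per-locus indices measure, the sketch below computes availability, observed and effective numbers of alleles, observed heterozygosity, gene diversity (He = 1 - Σp²), PIC (Botstein et al. 1980) and rarefied allelic richness for a single locus. It is a simplified illustration, not a re-implementation of PowerMarker, FSTAT or HP-Rare: those programs apply their own sample-size corrections and handle private alleles in ways not modeled here, and all function names and example genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

def locus_stats(genotypes):
    """Single-locus diversity indices from diploid genotypes.

    `genotypes` holds (allele1, allele2) tuples; None marks a missing genotype."""
    scored = [g for g in genotypes if g is not None]
    counts = Counter(a for g in scored for a in g)   # allele copy counts
    total = sum(counts.values())                     # 2 x genotyped plants
    freqs = [c / total for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    n = len(scored)
    he = 1.0 - sum_p2
    return {
        "availability": n / len(genotypes),          # share of plants genotyped
        "Na": len(counts),                           # observed number of alleles
        "Ne": 1.0 / sum_p2,                          # effective number of alleles
        "Ho": sum(a != b for a, b in scored) / n,    # observed heterozygosity
        "He": he,                                    # gene diversity 1 - sum p^2
        "He_unbiased": he * 2 * n / (2 * n - 1),     # Nei's small-sample correction
        # Botstein et al. (1980): PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2
        "PIC": he - sum(2 * (pi * pj) ** 2 for pi, pj in combinations(freqs, 2)),
    }

def rarefied_richness(counts, g):
    """Rarefaction: expected number of distinct alleles in a random
    subsample of g gene copies drawn without replacement."""
    total = sum(counts.values())
    return sum(1 - comb(total - ni, g) / comb(total, g) for ni in counts.values())

# Hypothetical locus scored in five plants, one of them missing.
example = [("152", "152"), ("152", "156"), ("156", "156"), ("152", "156"), None]
print(locus_stats(example))
print(rarefied_richness(Counter({"152": 4, "156": 4}), g=2))
```

Rarefying every population to the same number of gene copies g is what makes allelic richness comparable across populations with different amounts of missing data.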

Population structure and gene flow 

To estimate the components of variance among regions
of collection, and among and within populations,
analysis of molecular variance (AMOVA) was computed
using Arlequin v3.1 software (Excoffier et al. 2005). To
investigate population differentiation, Wright's (1951)
fixation index (FST) for the total populations and
pairwise FST among all pairs of populations were
computed using FSTAT software (Goudet 2002), and
significance was tested based on 10,000 bootstraps.
Gene flow was estimated indirectly as the number of
migrants per generation, Nm = (1 - FST)/(4FST). A
shared-allele distance matrix (Jin and Chakraborty 1994)
was used to construct a Neighbor-joining dendrogram of
the 160 samples belonging to the eight populations using
PowerMarker (Liu and Muse 2005), and the resulting tree
was viewed using TreeView 1.6.6 (Page 2001, available at
http://taxonomy.zoology.gla.ac.uk/rod/rod.html). Further,
population structure and admixture were examined using
the Bayesian model-based clustering method implemented
in STRUCTURE software, version 2.2 (Pritchard et al.
2000). Two separate analyses were run, with and without
prior information about the populations. In the first, the
site of collection was assigned as the putative population
of origin for each individual; in the second, no such
information was given and the STRUCTURE program
assigned each individual to a population. The admixture
model with correlated allele frequencies was used, as
suggested in the manual. A burn-in period of 10,000
iterations was followed by 10,000 Markov Chain Monte
Carlo (MCMC) replications for data collection for K = 1
to K = 8 groups, with 10 replicate runs for each K value.
This procedure clusters individuals into populations and
estimates the proportion of membership of each
individual in each population (Falush et al. 2003). The
optimum number of clusters between K = 1 and K = 8
was predicted following the method of Evanno et al.
(2005) using the web-based software STRUCTURE
HARVESTER v0.6.92 (Earl and vonHoldt 2012).
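The ΔK statistic of Evanno et al. (2005) compares the second-order rate of change of the log-likelihood ln P(D) across K with the spread of replicate runs at each K. A minimal Python sketch, assuming replicate ln P(D) values have already been extracted from the STRUCTURE output (the dictionary layout, function name and values below are hypothetical), is shown here. It applies the formula to replicate means; the original paper averages |L''(K)| over runs, so values can differ slightly between implementations.

```python
import numpy as np

def evanno_delta_k(lnp_by_k):
    """Delta K from replicate STRUCTURE runs.

    lnp_by_k maps K -> list of ln P(D) values, one per replicate run.
    Uses replicate means: |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K))."""
    mean = {k: float(np.mean(v)) for k, v in lnp_by_k.items()}
    sd = {k: float(np.std(v, ddof=1)) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined for the smallest and largest K tested
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = mean[k + 1] - 2 * mean[k] + mean[k - 1]
            delta[k] = abs(second_diff) / sd[k]
    return delta

# Hypothetical ln P(D) values (two replicates per K), not results from this study.
runs = {1: [-5000.0, -5001.0], 2: [-4500.0, -4502.0],
        3: [-4450.0, -4460.0], 4: [-4440.0, -4470.0]}
dk = evanno_delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, "-> best K =", best_k)
```

The K with the largest ΔK is taken as the most likely number of clusters; with the hypothetical values above that is K = 2, because the likelihood gain levels off sharply after it.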

To study the pattern of gene flow, Slatkin's FST matrix
was first converted into Rousset's (1997) genetic distance
matrix, FST/(1 - FST). Geographic distances among the
collection sites were computed from the GPS coordinates
using the web-based Geographic Distance Matrix
Generator (GDMG) version 1.2.3 of the American Museum
of Natural History, Center for Biodiversity and
Conservation (http://biodiversityinformatics.amnh.org/open_source/gdmg/index.php).
The correlation between Rousset's genetic distance matrix
and the geographic distance matrix was then calculated
using the web-based program IBDWS version 3.23
(available at http://ibdws.sdsu.edu/~ibdws/), and its
significance was tested using a Mantel (1967) test.
Moreover, reduced major axis (RMA) regression (Hellberg
1994) of genetic on geographic distance was used to
estimate the intercept and slope for inference of
isolation by distance.
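The transformations and tests in this paragraph can be illustrated with a short Python sketch: the island-model estimate Nm = (1 - FST)/(4FST) used above, Rousset's FST/(1 - FST), a basic permutation Mantel test, and the reduced major axis slope. It is not the IBDWS implementation and omits the bootstrap standard errors reported in the Results; the function names, permutation count and seed are assumptions.

```python
import numpy as np

def nm_from_fst(fst):
    """Indirect gene-flow estimate under the island model: Nm = (1 - FST) / (4 FST)."""
    return (1.0 - fst) / (4.0 * fst)

def rousset_distance(fst_matrix):
    """Rousset (1997) linearised genetic distance FST / (1 - FST), element-wise."""
    f = np.asarray(fst_matrix, dtype=float)
    return f / (1.0 - f)

def _upper(m):
    """Entries above the diagonal of a square distance matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def mantel_test(a, b, n_perm=9999, seed=0):
    """Pearson r between two distance matrices and a one-sided permutation
    p-value (rows and columns of `a` are permuted together)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(_upper(a), _upper(b))[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(a.shape[0])
        r_perm = np.corrcoef(_upper(a[np.ix_(p, p)]), _upper(b))[0, 1]
        hits += int(r_perm >= r_obs)
    return r_obs, (hits + 1) / (n_perm + 1)

def rma_regression(x, y):
    """Reduced major axis line: slope = sign(r) * sd(y) / sd(x), through the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    return y.mean() - slope * x.mean(), slope

print(round(nm_from_fst(0.40), 3))  # 0.375, which rounds to the Nm = 0.38 reported in this study
```

Permuting whole rows and columns together, rather than individual matrix entries, keeps the dependence among distances that share a site, which is why the Mantel test is used instead of an ordinary correlation test.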



Results 

Diversity of quantitative phenotypic traits 

Considerable variation was observed among the popula-
tions for the measured quantitative phenotypic traits.
The number of tillers ranged from zero to five. Head
length ranged from 11 cm to 46 cm, and head width
from 5 to 40 cm, with a mean of 12.1 cm. Plant height
ranged from 147 cm (in the improved lowland variety
76T1#23, coded Wello-2) to 470 cm (Metekel-1), with a
mean of 289 cm. Six of the eight populations had an
average height of more than 3 m. Leaf width ranged
from 4.4 cm (Wello-2) to 12.5 cm (Metekel-1), leaf
length from 42 cm to 100 cm, and leaf number from six
(Wello-2) to 24 (Wello-4) (Table 2).



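Table 2 Simple descriptive statistics and principal component factor loadings of the measured quantitative phenotypic traits (TIL = number of tillers; HDL = head length; HDW = head width; PLHT = plant height; LW = flag leaf width; LL = flag leaf length; LN = leaf number; HDL, HDW, PLHT, LW and LL in cm)

Population name | TIL | HDL | HDW | PLHT | LW | LL | LN
Gibe-1 | 0.6 | 31.5 | 17.3 | 324.0 | 9.5 | 77.1 | 17.0
Gibe-2 | 1.0 | 36.1 | 14.5 | 315.3 | 8.4 | 77.7 | 17.0
Metekel-1 | 0.5 | 29.6 | 12.4 | 359.4 | 8.1 | 75.6 | 16.0
Metekel-2 | 0.1 | 32.2 | 16.0 | 322.0 | 8.0 | 74.2 | 14.6
Wello-1 | 1.0 | 24.4 | 10.2 | 301.5 | 8.0 | 63.1 | 16.1
Wello-2 | 0.1 | 21.1 | 8.6 | 166.4 | 6.0 | 63.6 | 7.7
Wello-3 | 0.3 | 27.2 | 10.2 | 221.0 | 8.6 | 68.3 | 11.8
Wello-4 | 0.9 | 15.4 | 7.9 | 304.9 | 8.9 | 63.1 | 19.7

Total population
Minimum | 0 | 11 | 5 | 147 | 4.4 | 42 | 6
Maximum | 5 | 46 | 40 | 470 | 12.5 | 100 | 24
Mean | 0.5 | 27.2 | 12.1 | 289.3 | 8.2 | 70.3 | 15.0
±SE | 0.073 | 0.590 | 0.421 | 5.619 | 0.115 | 0.834 | 0.311

Factor loadings
PC1 | -0.211 | -0.285 | -0.297 | -0.445 | -0.350 | -0.277 | -0.383
PC2 | 0.555 | -0.295 | -0.302 | -0.014 | -0.092 | -0.362 | 0.211
PC3 | 0.348 | 0.384 | 0.172 | -0.359 | 0.131 | 0.403 | -0.381
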
Correlation was significant for all pairs of traits
(p < 0.05), except between number of tillers and head
length, head width, and leaf length; between head length
and leaf number; and between leaf length and leaf
number. Plant height and leaf width had highly
significant positive correlations with the remaining
quantitative phenotypic traits. The first three principal
components explained 80.53% of the total variation.
Plant height had the largest factor loading on PC1
(absolute value 0.44), PC2 was mostly influenced by the
number of tillers per plant (0.56), and leaf length
contributed the largest loading on PC3 (0.40). Figure 2
shows the pattern of phenotypic diversity in the 160
plant samples from the eight populations based on the
first two principal components. Four clusters are clearly
observed in this biplot. Cluster I consisted of W2
individuals. Cluster II was composed of individuals from
the Wello populations W1 and W4. Cluster III was
dominated by the Gibe populations, G1 and G2, with
some individuals of similar phenotype from the M1
population. Cluster IV was mainly composed of
individuals from W3. The Metekel populations M1 and
M2 had individuals represented in all of the clusters.

SSR polymorphism 

Allele availability at each locus (the proportion of
individuals with non-missing genotypes) ranged from
0.93 (Sb4-121) to 1.0 (Sb5-206, Sb1-1, Sb6-34, and
Sb4-72), with a mean of 0.97. All 12 SSR loci were highly
polymorphic, with PIC values ranging from 0.38 to 0.85
(mean = 0.62) (Table 3). All except two SSR loci had PIC
values greater than or equal to 0.5. They produced a total
of 123 alleles, of which 78 (63.4%) were rare (frequency
< 0.05). The number of alleles per polymorphic locus
ranged from 3 to 27, with an average of 10.25. The
effective number of alleles ranged from 1.7 to 7.5
(Table 3). The frequency of the major allele ranged from
0.24 to 0.75, with a mean of 0.47. A comparison of the
SSR size ranges observed in the present study with those
in previously published reports is presented in Additional
file 1: Table S1. Tests for Hardy-Weinberg equilibrium
(HWE) across all loci and populations revealed no
significant deviation from HWE.



Genetic diversity 

The values of the genetic diversity indices of the eight
populations are presented in Table 4. Average observed
heterozygosity (HO) ranged from 0.05 to 0.32, with a
mean of 0.13 across all loci. Gene diversity was lowest in
the Gibe-1 population (He = 0.20) and highest in the
Wello-2 population (He = 0.70), and its value averaged
over all populations and loci was 0.67 (SD = 0.11). The
W2 population also had the highest allelic richness.
Differences in allelic richness and private allelic richness
were significant (p < 0.05) over all pairs of populations
and loci. The M1 population had the highest (Rp = 0.83)
and G1 the lowest (Rp = 0.11) private allelic richness.
Among regions of collection, Wello had the highest gene
diversity (He = 0.70) and Gibe the lowest (He = 0.40);
differences were significant between Gibe and Wello
(Z = -3.06, p = 0.002) and between Metekel and Wello
(Z = -2.35, p = 0.01). Allelic richness was lowest in Gibe
(Rs = 3.9) and highest in Wello (Rs = 6.8); differences
were significant between Gibe and Metekel (Z = -2.13,
p = 0.03) and between Gibe and Wello (Z = -2.82,
p = 0.006), but not between Metekel and Wello
(Z = -1.78, p = 0.08). Similarly, differences in private
allelic richness were significant between Gibe and
Metekel (Z = -2.5, p = 0.01) and between Gibe and
Wello (Z = -2.67, p = 0.008), but not between Metekel
and Wello (Z = -0.71, p = 0.48).
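The Z values above come from Wilcoxon signed-rank tests. Assuming the comparisons were paired by locus (the text does not state the pairing unit explicitly), the normal-approximation Z can be sketched as follows. The function name and example values are hypothetical, and SPSS's sign convention and handling of ties and zero differences may differ in detail.

```python
import numpy as np
from scipy import stats

def wilcoxon_z(x, y):
    """Wilcoxon signed-rank test for paired values (e.g. per-locus gene
    diversity in two regions), reported as a normal-approximation Z.

    Zero differences are dropped; tied |differences| get average ranks and
    the variance is tie-corrected. No continuity correction is applied."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]
    n = d.size
    ranks = stats.rankdata(np.abs(d))                 # average ranks for ties
    w_plus = ranks[d > 0].sum()
    w_minus = ranks[d < 0].sum()
    mean = n * (n + 1) / 4.0
    _, tie_counts = np.unique(np.abs(d), return_counts=True)
    var = n * (n + 1) * (2 * n + 1) / 24.0 - np.sum(tie_counts ** 3 - tie_counts) / 48.0
    z = (min(w_plus, w_minus) - mean) / np.sqrt(var)  # non-positive by construction
    p = 2.0 * stats.norm.cdf(z)                       # two-sided p-value
    return z, p

# Hypothetical per-locus gene diversity for two regions (12 loci), not study data.
region_a = [0.45, 0.60, 0.30, 0.62, 0.41, 0.66, 0.38, 0.50, 0.21, 0.47, 0.33, 0.44]
region_b = [0.71, 0.82, 0.55, 0.60, 0.69, 0.80, 0.64, 0.77, 0.40, 0.70, 0.52, 0.73]
print(wilcoxon_z(region_a, region_b))
```

Pairing by locus removes the large locus-to-locus differences in diversity (compare Sb4-15 and Sb1-1 in Table 3), so the test focuses on the consistent direction of the regional difference.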

Figure 2 PCA biplot showing the distribution of the 160 sorghum samples based on their measured phenotypes.

Table 3 Diversity indices of the SSR loci used in the study (Na = observed number of alleles; Ne = effective number of alleles; Rs = allelic richness; Ho = average observed heterozygosity; He = expected heterozygosity/gene diversity; PIC = polymorphism information content)

SSR locus | Availability | Na | Ne | Rs | Ho | He | PIC
Sb5-206 | 1.00 | 14 | 4.4 | 7.71 | 0.15 | 0.78 | 0.75
Sb1-1 | 1.00 | 27 | 7.5 | 10.00 | 0.15 | 0.87 | 0.85
Sb6-34 | 1.00 | 8 | 2.2 | 4.12 | 0.10 | 0.54 | 0.49
Sb5-256 | 0.94 | 3 | 2.5 | 3.00 | 0.33 | 0.60 | 0.53
Sb4-72 | 1.00 | 9 | 3.0 | 4.92 | 0.12 | 0.66 | 0.61
Sb6-84 | 0.99 | 12 | 3.9 | 6.93 | 0.12 | 0.75 | 0.73
Sb4-121 | 0.92 | 8 | 3.0 | 5.07 | 0.04 | 0.67 | 0.61
Sb6-342 | 0.94 | 10 | 3.9 | 5.35 | 0.09 | 0.75 | 0.69
Sb4-15 | 0.93 | 7 | 1.7 | 3.95 | 0.12 | 0.42 | 0.38
Sb5-236 | 0.96 | 9 | 3.1 | 5.28 | 0.17 | 0.68 | 0.63
Sb6-57 | 0.95 | 8 | 2.3 | 3.57 | 0.11 | 0.57 | 0.48
SbKAFGK1 | 0.97 | 8 | 3.2 | 3.80 | 0.07 | 0.69 | 0.62
Overall mean | 0.97 | 10.25 | 3.4 | 5.31 | 0.13 | 0.66 | 0.61


Population genetic structure and gene flow 

AMOVA showed 54.44% of the variation to be within
populations, 32.76% among populations within regions,
and 12.8% among the regions of origin (FST = 0.40,
p < 0.001) (Table 5). Pairwise FST values among all
populations were significant (p < 0.001) (Table 6).
Divergence among the regions of collection was also
high (FST = 0.21, p = 0.02). The Neighbor-joining
dendrogram grouped the 160 individuals of the eight
populations into three major clusters (Figure 3). Cluster I
consisted of individuals of the improved early-maturing
variety 76T1#23 (Wello-2 population). Cluster II joined
the two populations from Metekel (Metekel-1 and
Metekel-2), a population from Wello (Wello-1) and
Gibe-1 population.

Table 4 Summary of the population diversity indices averaged over the 12 loci (Na = number of alleles per polymorphic locus; Ap = number of private alleles; Rs = allelic richness; Ho = average observed heterozygosity; He = gene diversity)

Population | Na | Ap | Rs | Ho | He
Gibe-1 | 2.1 | 1 | 1.88 | 0.10 | 0.13
Gibe-2 | 3.4 | 2 | 3.07 | 0.21 | 0.46
Metekel-1 | 4.6 | 12 | 3.84 | 0.12 | 0.42
Metekel-2 | 4.3 | 7 | 3.98 | 0.21 | 0.50
Wello-1 | 5.2 | 9 | 2.87 | 0.20 | 0.69
Wello-2 | 3.1 | 7 | 4.93 | 0.07 | 0.36
Wello-3 | 3.1 | 2 | 3.06 | 0.10 | 0.40
Wello-4 | 3.1 | 7 | 3.14 | 0.06 | 0.34

The third cluster (cluster III) consisted of individuals from the two adjacent Wello
populations (Wello-3 and Wello-4) and the Gibe-2 population.
This clustering pattern was also similar to that in the
principal component biplot (Figure 4). The Evanno et al.
(2005) method applied to the STRUCTURE outputs
predicted K = 2 as the most likely number of clusters
(Figure 5). STRUCTURE runs with and without prior
information on the populations gave similar clustering
(K = 2). With no prior information, 73 (46%) of the 160
individual plants were assigned to cluster I with >0.90
probability of membership, whereas 71 (44%) were
assigned to cluster II with the same probability of
membership. Assigning the site of collection as the
putative population of origin for each individual (with
prior information) gave exactly the same result
(Figure 6). In this case, each cluster contained 6 to 20
members from each of five populations with >0.90
coefficient of ancestry. All 20 (100%) individuals of each
of the Metekel-1 and Gibe-1 populations, and 17 (85%)
individuals of the Metekel-2 population, were grouped in
Cluster I (Additional file 2: Table S2). The number of
migrants per generation, an indirect estimate of gene
flow, was very low (Nm = 0.38) across all populations.
However, gene flow as high as Nm = 3.66 was observed
between the adjacent Metekel populations (M1 and M2).

The Mantel test revealed a weak but significant
correlation between Rousset's genetic distance and the
geographic distance matrices (r = 0.272, p = 0.020).
Moreover, the reduced major axis (RMA) regression
showed a significant relationship, with an intercept of
-0.2936 (SE = 0.2290, 1000 bootstraps over individual
pairs), a slope of 0.003 (SE = 0.0006), and a coefficient
of determination R² = 0.074 (Figure 7).

Discussion 

Diversity of quantitative phenotypic traits 

It is well known that most Ethiopian sorghum landraces
are characterized by high biomass (tall plants, large leaf
area and many leaves). All of the populations in the
present study displayed these characters except the only
improved exotic variety, 76T1#23, which deviated from
this pattern. For instance, all populations except Wello-2
and Wello-3 had an average height between 300 and
360 cm. The Wello-3 (Jigurte) population is shorter and
earlier maturing than Degalit (Wello-4), the high-yielding
cultivar that was previously dominant. Jigurte is now
becoming the dominant cultivar in the Kobo-Alamata
plain because it matures earlier than the other landraces
and is better suited to the changing climate, mainly the
unreliable rainfall in the region. Some farmers call it
"America", perhaps because it was introduced from
elsewhere decades ago. The observed high variation in
the range of the quantitative phenotypic traits in
all populations could have a genetic basis, or it could be
due to phenotypic plasticity.

Table 5 Analysis of molecular variance (AMOVA) among the sorghum regions of collection, among the populations within geographical regions, and within sorghum landrace populations

Source of variation | d.f. | Sum of squares | Variance components | Percentage of variation | p value
Among geographical regions of collection | 2 | 141.00 | 0.341 | 12.8 | <0.001
Among populations within geographical regions | 5 | 181.88 | 0.873 | 32.76 | <0.001
Within populations | 312 | 452.68 | 1.451 | 54.44 | <0.001
Total | 319 | 775.56 | 2.665 | |

If it is due to the latter, it could
be due to the differences in the rainfall and temperature 
as there was little variation in altitude of the collection 
sites (1088-1500 m) to bring about such changes. Ayana 
et al. (2000) studied the geographical pattern of quantita- 
tive phenotypic traits in Ethiopian and Eritrean sorghum 
gene bank accessions. They found that the variation 
within and among geographical regions was high and they 
suggested that gradients of growing period, rainfall and 
temperature are more important for such variations and 
should be considered during future germplasm collection. 
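The "Percentage of variation" column of Table 5 is each variance component divided by the sum of the components. The sketch below re-derives those percentages and checks that the degrees of freedom and sums of squares add up to the Total row. It uses only the values printed in the table; the small difference for the within-population share (about 54.45% versus the reported 54.44%) presumably reflects rounding of the printed components.

```python
# Values copied from Table 5 (AMOVA)
degrees_of_freedom = {"Among regions": 2, "Among populations within regions": 5, "Within populations": 312}
sums_of_squares = {"Among regions": 141.0, "Among populations within regions": 181.88, "Within populations": 452.68}
components = {
    "Among regions": 0.341,
    "Among populations within regions": 0.873,
    "Within populations": 1.451,
}

# The Total row should equal the sum of the hierarchical levels
print("d.f. total:", sum(degrees_of_freedom.values()))  # Total row: 319
print(f"sum of squares total: {sum(sums_of_squares.values()):.2f}")  # Total row: 775.56
total = sum(components.values())  # Total row: 2.665

# Percentage of variation = variance component / total variance * 100
percentages = {source: 100.0 * s2 / total for source, s2 in components.items()}
for source, pct in percentages.items():
    print(f"{source}: {pct:.2f}%")
```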

SSR polymorphism and genetic diversity 

The observed SSR fragment sizes were within the range reported in previous studies of sorghum (Brown et al. 1996; Dean et al. 1999; Ghebru et al. 2002; Agrama and Tuinstra 2003; Abu-Assar et al. 2005). This set of primer pairs is highly polymorphic and is used for genetic fingerprinting as well as in marker-assisted breeding programs.

Although the present in situ germplasm set was not compared with historical gene bank accessions, the high genetic diversity observed in the Wello populations compared with the populations from other regions may indicate that sorghum genetic diversity in this region is still in good condition. This may suggest that farmers can conserve their landraces even during harsh drought seasons. However, these data do not reveal changes in the historical genetic diversity of the region. The highest diversity, found in the Wello-2 population representing an exotic improved variety (76 T1#23), was rather unexpected.



Table 6 Pairwise Fst matrix, a measure of population divergence among the sorghum landrace populations (all pairs were significant, p < 0.001)

        G1      G2      M1      M2      W1      W2      W3
G2    0.396
M1    0.440   0.324
M2    0.327   0.223   0.061
W1    0.350   0.201   0.176   0.100
W2    0.651   0.431   0.500   0.405   0.308
W3    0.692   0.430   0.471   0.430   0.295   0.547
W4    0.694   0.345   0.484   0.445   0.357   0.565   0.389

G, Gibe; M, Metekel; W, Wello populations.



This variety was released in 1976 and has long been distributed to farmers all over the country. Its high diversity might therefore arise because the variety was mixed with the landraces and lost its genetic purity. By region of collection, the Gibe populations had the lowest diversity and the Wello populations the highest in terms of allelic richness and number of private alleles. The significantly lower genetic diversity indices of the Gibe and Metekel populations compared with those of the Wello populations may indicate some level of genetic drift (although the SSR analysis did not confirm this) resulting from the sampling of seeds by migrants during settlement. Farmers usually carry few heads when they migrate, and these may represent only a few genotypes. The Gibe and Metekel areas had no history of sorghum production before settlement.
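Counting private alleles, one of the measures used above to rank the regions, only requires knowing in which groups each allele occurs. The sketch below assumes that each region's data have been reduced to a set of (locus, allele size) pairs; the function name, locus names and allele sizes are hypothetical. It does not compute allelic richness, which additionally requires rarefaction to a common sample size.

```python
from collections import defaultdict


def private_alleles(alleles_by_group):
    """Alleles observed in exactly one group.

    alleles_by_group maps a group name (e.g. a region) to the set of
    (locus, allele_size) pairs observed in that group.
    """
    groups_with = defaultdict(set)  # allele -> groups in which it occurs
    for group, alleles in alleles_by_group.items():
        for allele in alleles:
            groups_with[allele].add(group)
    return {
        group: {allele for allele in alleles if groups_with[allele] == {group}}
        for group, alleles in alleles_by_group.items()
    }


# Hypothetical toy data: two loci, three regions
observed = {
    "Wello": {("locus1", 150), ("locus1", 154), ("locus2", 210), ("locus2", 214)},
    "Metekel": {("locus1", 150), ("locus2", 210)},
    "Gibe": {("locus1", 150), ("locus2", 212)},
}
for region, alleles in private_alleles(observed).items():
    print(region, len(alleles), sorted(alleles))
```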

The gene diversity of the studied Ethiopian sorghum populations (He = 0.66) was similar to that of Kenyan sorghum accessions (He = 0.66) (Ngugi and Onyango 2012), Niger sorghum (He = 0.613) (Deu et al. 2008) and Eritrean sorghum (He = 0.65) (Ghebru et al. 2002), but higher than that of Moroccan sorghum (He = 0.29) (Dje et al. 2000) and of the sorghum studied by Barnaud et al. (2007) (He = 0.32) using similar SSR markers. However, these comparisons should be interpreted with caution because the numbers of samples and the sampling strategies differed. Similarly, the observed heterozygosity (Ho = 0.13) was comparable to that reported by Dje et al. (2000) (Ho = 0.134), slightly higher than that of Barnaud et al. (2007) (Ho = 0.11), and much higher than that of Deu et al. (2008) (Ho = 0.042). Although there was no significant departure from HWE, the observed heterozygosity was much lower than the expected heterozygosity (gene diversity). Consistent with this study, Nybom (2004) compiled 79 microsatellite-based studies and found that the grand mean of Ho was lower than that of He in 64 of them. Similarly, most genetic diversity studies in sorghum using SSRs (e.g., Ghebru et al. 2002; Barro-Kondombo et al. 2010; Deu et al. 2010; Ngugi and Onyango 2012) support this finding.
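The contrast between observed heterozygosity and gene diversity discussed above can be made concrete for a single locus. The sketch below assumes diploid genotypes coded as pairs of allele sizes and uses the simple estimator He = 1 - Σp_i² without the small-sample correction; the function name and genotypes are hypothetical. Population-level values such as those compared above are typically averages of these per-locus quantities.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual,
    with missing data already removed. He is Nei's gene diversity 1 - sum(p_i^2)
    without the small-sample correction 2n / (2n - 1).
    """
    n = len(genotypes)
    # Ho: proportion of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies p_i from the 2n gene copies
    counts = Counter(allele for pair in genotypes for allele in pair)
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return ho, he


# Hypothetical SSR genotypes (allele sizes in bp) for eight plants at one locus
genotypes = [(150, 150), (150, 150), (154, 154), (154, 154),
             (158, 158), (158, 158), (150, 154), (162, 162)]
ho, he = locus_diversity(genotypes)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

Here one heterozygote among eight plants gives Ho = 0.125, while four alleles at moderate frequencies give He of about 0.727. Mostly homozygous genotypes carrying several different alleles produce exactly the pattern Ho << He, which is expected in a predominantly self-pollinating crop such as sorghum.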

Cuevas and Prom (2013) studied the genetic diversity and population structure of 137 sorghum accessions of Ethiopian origin preserved at the USDA-ARS National Plant Germplasm System (NPGS) using 20 SSRs and found observed and expected heterozygosity of 0.23 and 0.78, respectively. These values are higher than ours. Even though ex situ accessions can lose variability through the loss of low-frequency alleles (<1%) during repeated regeneration (e.g., Wilkes 1989; Adugna et al. 2013), the major difference between the diversity in the present study and that of Cuevas and Prom (2013) probably lies in the sampling strategy, including the sampling area and period. Sorghum grows almost everywhere in Ethiopia at altitudes between 500 and 2400 m. However, our sampling focused on only three geographical regions, and individual plant samples were collected in 3-5 farmers' fields. Cuevas and Prom (2013) mentioned that most of the accessions they used had no passport data, so these could be countrywide collections. Including more locations may increase the chance of capturing higher diversity.

Figure 3 Neighbor-joining radial tree showing the clustering pattern of individual samples from the eight sorghum landrace populations.

Population structure and gene flow 

Some landrace populations with different folk names, such as Degalit and Jigurte from Wello, which are morphologically different, were not found to be distinct using SSRs. This could have several explanations. First, they may not be genetically distinct from each other, in which case the morphological differences between these cultivars may have little genetic basis and could instead result from farmers' directional selection for different morphological traits for different purposes. Second, the observed morphological differences might not be detectable with neutral genetic markers. Similar observations were made in Mali, Guinea and Kenya, where varieties considered different on the basis of their vernacular (folk) names or collection sites were in fact very closely related according to SSR markers (Sagnard et al. 2011; Labeyrie et al. 2014). Collections in Metekel and the Gibe valley were made in settlement villages composed mainly of people from Wello. Therefore, as expected, the clustering of the Metekel-1 and Metekel-2 populations with the Wello-1 and Gibe-1 populations might be due to long-distance seed movement with settlers. Surprisingly, the Gibe populations were more closely affiliated with the Wello and Metekel populations than with each other. High gene flow was also observed between Wello-1 and Metekel-2 (Nm = 2.25) and between Wello-1 and Metekel-1 (Nm = 1.14), comparable to the gene flow between the adjacent Metekel-1 and Metekel-2 populations (Nm = 3.66). However, gene flow across all populations was very low, in contrast to the ex situ accessions of Ethiopian origin conserved at USDA-ARS (Cuevas and Prom 2013).

Figure 4 Principal component (PCA) biplot of the 160 sorghum samples based on correlation of SSR allele frequencies.

Figure 5 A biplot detected the maximum peak at K = 2 (the optimum number of clusters) based on the Evanno et al. (2005) method.

The Mantel test for the correlation between the matrices of Rousset's genetic distance and geographic distance shows only whether the relationship is significant; the slope and intercept of this relationship should therefore be estimated using regression techniques (Bohonak 2002). Among regression techniques, reduced major axis (RMA) regression is reportedly better for the analysis of isolation by distance (Hellberg 1994). Computing the intercept and slope of the relationship between the genetic distance of the sorghum landraces and the geographic distance of the collection sites by RMA regression showed a weak but significant relationship (slope 0.003 ± SE 0.0006). This indicates that gene flow among populations follows a pattern of isolation by distance (IBD) under a two-dimensional stepping-stone model, in which the farther apart populations are located, the weaker their relationships. However, long-distance seed movement by settlers, as has already happened from the Wello area to Metekel, could be the major force shaping the genetic structure of the landrace populations. Thus, we believe that the population genetic structure of the studied landraces was strongly influenced by human migration, as supported by Figure 3. It is known that the pattern of genetic structure still observed today is the result of the history of domestication and human migrations (Sagnard et al. 2011).

Figure 6 STRUCTURE bar graphs of the 160 individual sorghum plant samples in eight predetermined populations (x-axis) at K = 2. Values on the y-axis show the coefficient of membership (assignment).

Figure 7 RMA regression of Rousset's genetic distance matrix against the geographic distance (km) matrix of the sorghum landrace collection sites in Ethiopia (y = -0.2936 + 0.00343x; R² = 0.074; p < 0.001).
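The RMA fit described above can be sketched in a few lines: the RMA slope is sign(r)·s_y/s_x, the line passes through the two means, and standard errors can be approximated by resampling points with replacement, in the spirit of the 1000 bootstraps reported in the Results. This is a minimal sketch with hypothetical function names and toy data, not the software used in the study, and this naive bootstrap ignores the non-independence of pairwise distances.

```python
import random
import statistics


def rma(x, y):
    """Reduced major axis fit: slope = sign(r) * s_y / s_x, line through the means."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = (1.0 if sxy >= 0 else -1.0) * statistics.stdev(y) / statistics.stdev(x)
    return slope, my - slope * mx


def bootstrap_se(x, y, reps=1000, seed=1):
    """Bootstrap standard errors of the RMA slope and intercept (points resampled with replacement)."""
    rng = random.Random(seed)
    points = list(zip(x, y))
    slopes, intercepts = [], []
    for _ in range(reps):
        xs, ys = zip(*(rng.choice(points) for _ in points))
        if len(set(xs)) < 2:
            continue  # skip degenerate resamples in which every x is identical
        slope, intercept = rma(xs, ys)
        slopes.append(slope)
        intercepts.append(intercept)
    return statistics.stdev(slopes), statistics.stdev(intercepts)


# Hypothetical toy data (not the study's): geographic (km) vs genetic distance
geo_km = [20, 60, 110, 180, 240, 320, 400, 470]
gen_dist = [0.05, 0.02, 0.30, 0.25, 0.55, 0.40, 0.95, 0.80]
slope, intercept = rma(geo_km, gen_dist)
se_slope, se_intercept = bootstrap_se(geo_km, gen_dist)
print(f"slope = {slope:.5f} (SE {se_slope:.5f}), intercept = {intercept:.4f} (SE {se_intercept:.4f})")
```

Because the ordinary least-squares slope equals |r| times the RMA slope, a weak correlation such as r = 0.272 (R² = 0.074) makes the RMA slope roughly 3.7 times steeper than an OLS slope fitted to the same points, which is one reason the choice of regression method matters for isolation-by-distance estimates.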

Implications for crop improvement and genetic resources conservation

The importance of crop diversity in counteracting genetic vulnerability, and the ways in which plant breeding, plant variety legislation and an expanding seed industry may influence genetic diversity, are discussed elsewhere (e.g., Brown 1983). It has been argued that, because of recurrent drought in some of the major sorghum-growing regions of the country, the diversity of the crop is declining over time, and farmers in the dry lowlands tend to use high-yielding, early-maturing improved sorghum varieties or to shift their production systems to more vulnerable, low-yielding, early-maturing crop species such as tef (Eragrostis tef), the dominant cereal, which might have resulted in genetic erosion of the sorghum landraces in these regions. Adoption of early-maturing improved varieties was also found to be high in two such areas affected by recurrent drought, Kobo in the northeast and Mieso in the east of the country (Bekele et al. 2013). However, inadequate seed supply and lack of promotion prevent the improved varieties from spreading further to other regions of the country. Seed supply in the latter regions is inadequate partly because farmers decide to plant seed of improved sorghum varieties late in the season, only when the seed of their landraces fails to emerge. At that time, there is no way to obtain improved seed except a few kilos that some farmers receive from the research institutes for testing. When planting the improved seed, farmers usually do not remove the remnant plants of their landraces from the previous planting. As a result, the harvest from such fields does not provide quality seed for the next season's planting. For this reason, and because of the lack of isolation from neighboring fields, the improved seed usually cannot be used for more than two cycles. Another reason for the lack of widespread adoption of improved varieties in most regions is the subsistence nature of poor sorghum farmers' lives: they cannot afford to buy seed of improved varieties. Because the improved varieties are usually planted on small plots, farmers consume whatever they harvest from them, no matter how much they like them. Thus, they have no seed for the coming season and the problem persists. Unlike improved seed, seed of the landraces can be shared freely, exchanged in kind or purchased from the market at any time of the year.

In some areas, such as the extreme north of Wello (where we collected the Wello-2, 3 and 4 populations), sorghum was once the dominant and a highly diverse crop. At present, however, tef is the dominant crop species in this area, and wherever sorghum is grown, only a few representative cultivars that can cope with the changing climate dominate. Shewayrga et al. (2012) demonstrated a loss of diversity in sorghum landraces in this region of Wello by comparing historical accessions preserved in gene banks for 30 years (originally collected in 1973) with in situ collections (newly collected in 2003).

The current Ethiopian sorghum germplasm holdings at the Ethiopian Institute of Biodiversity (EIB) number 9432 accessions (http://www.ibc.gov.et/biodiversity/conservation/database-ms). This number is small compared with the total Ethiopian germplasm preserved elsewhere: for instance, more than 7000 germplasm accessions of Ethiopian origin are preserved at the US National Plant Germplasm System (Erpelding and Prom 2009) and another 4500 at the ICRISAT genebank (IBC Institute of Biodiversity Conservation 2007). Moreover, even though Ethiopian germplasm has served the world as a source of valuable genes or for direct cultivation, the Ethiopian research system has not yet fully utilized these resources. As a result, and because of the mismatch between farmers' preferences and breeders' selection criteria, there has been little success in breeding farmer-preferred sorghum varieties in Ethiopia. Over the past four decades, more than 40 varieties have been released for the different agroecologies of Ethiopia, except for the wet lowlands, including Metekel zone, which was covered by this study. However, none of the released varieties has been widely adopted by farmers. Because the wet lowlands combine high moisture (humidity) and high temperature, they are conducive to various fungal leaf and head diseases that attack sorghum. The breeding program has depended almost exclusively on introduced germplasm, which is short in height and early maturing, and little attention has been given to the landraces. However, some landraces are well adapted to the various sorghum-growing environments owing to co-evolution with the changing climate, insect pests, striga (a parasitic weed) and the pathogens of common diseases. Admittedly, some of these landraces have limitations such as poor grain quality and maturity periods as long as nine months; on the other hand, landraces with better grain quality are also found. Therefore, future sorghum improvement should focus on improving the landraces. For instance, in the present study, crossing the distinct Wello populations (Wello-3 and Wello-4) with some of the other populations may provide a good combination for selecting progenies with desirable characteristics for use as varieties in the wet lowlands, as they are genetically distant from one another. Moreover, reintroducing ex situ germplasm to its original places of collection, which are now losing their diversity, may help to revitalize the lost diversity.